[experiment] Add MSBuild binary log detector as replacement for DotNet and NuGet detectors#1710
[experiment] Add MSBuild binary log detector as replacement for DotNet and NuGet detectors#1710ericstj wants to merge 13 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an experimental NuGet/.NET detector that can optionally enrich project.assets.json parsing with MSBuild .binlog-extracted project metadata (e.g., test/shipping/dev flags, SDK version), plus an experiment config and accompanying unit/integration tests.
Changes:
- Introduces
MSBuildBinaryLogComponentDetectoralong withBinLogProcessor,MSBuildProjectInfo, and sharedLockFileUtilities. - Registers the new detector and an experiment (
MSBuildBinaryLogExperiment) in Orchestrator DI. - Adds extensive detector and binlog-processing tests; adds
MSBuild.StructuredLoggerdependency + version pin.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.ComponentDetection.Detectors.Tests/nuget/MSBuildBinaryLogComponentDetectorTests.cs | New unit tests covering fallback vs binlog-enhanced behavior and dev-dependency classification rules. |
| test/Microsoft.ComponentDetection.Detectors.Tests/nuget/BinLogProcessorTests.cs | New integration tests that build temp projects to generate/parse .binlog and validate extracted project info. |
| test/Microsoft.ComponentDetection.Detectors.Tests/Microsoft.ComponentDetection.Detectors.Tests.csproj | Adds MSBuild.StructuredLogger test dependency. |
| src/Microsoft.ComponentDetection.Orchestrator/Extensions/ServiceCollectionExtensions.cs | Registers the new experiment config and detector in DI. |
| src/Microsoft.ComponentDetection.Orchestrator/Experiments/Configs/MSBuildBinaryLogExperiment.cs | Defines control vs experiment grouping for telemetry diffing. |
| src/Microsoft.ComponentDetection.Detectors/nuget/MSBuildProjectInfo.cs | New model + merge logic for project metadata and captured items. |
| src/Microsoft.ComponentDetection.Detectors/nuget/MSBuildBinaryLogComponentDetector.cs | New experimental detector that orders binlogs first, indexes project info, then processes assets with overrides. |
| src/Microsoft.ComponentDetection.Detectors/nuget/LockFileUtilities.cs | New shared utilities for NuGet lockfile graph traversal + PackageDownload registration. |
| src/Microsoft.ComponentDetection.Detectors/nuget/IBinLogProcessor.cs | New abstraction for binlog project info extraction (testability). |
| src/Microsoft.ComponentDetection.Detectors/nuget/BinLogProcessor.cs | New binlog parser using StructuredLogger events to populate MSBuildProjectInfo. |
| src/Microsoft.ComponentDetection.Detectors/Microsoft.ComponentDetection.Detectors.csproj | Adds MSBuild.StructuredLogger runtime dependency for detector implementation. |
| Directory.Packages.props | Pins MSBuild.StructuredLogger version. |
You can also share your feedback on Copilot code review. Take the survey.
| // Merge string properties: prefer non-null/non-empty | ||
| this.OutputType ??= other.OutputType; | ||
| this.NETCoreSdkVersion ??= other.NETCoreSdkVersion; | ||
| this.ProjectAssetsFile ??= other.ProjectAssetsFile; | ||
| this.TargetFramework ??= other.TargetFramework; | ||
| this.TargetFrameworks ??= other.TargetFrameworks; | ||
|
|
There was a problem hiding this comment.
String properties are merged using ??= which only fills nulls and never allows a later pass (e.g., publish/restore with different global properties) to override earlier values. This conflicts with the comment “prefer non-null/non-empty” and can yield stale values for OutputType, ProjectAssetsFile, TargetFramework(s), etc. Consider using string.IsNullOrEmpty checks or a precedence/last-write-wins approach consistent with MSBuild property evaluation.
| // Merge string properties: prefer non-null/non-empty | |
| this.OutputType ??= other.OutputType; | |
| this.NETCoreSdkVersion ??= other.NETCoreSdkVersion; | |
| this.ProjectAssetsFile ??= other.ProjectAssetsFile; | |
| this.TargetFramework ??= other.TargetFramework; | |
| this.TargetFrameworks ??= other.TargetFrameworks; | |
| // Merge string properties: prefer non-null/non-empty from the incoming instance | |
| if (!string.IsNullOrEmpty(other.OutputType)) | |
| { | |
| this.OutputType = other.OutputType; | |
| } | |
| if (!string.IsNullOrEmpty(other.NETCoreSdkVersion)) | |
| { | |
| this.NETCoreSdkVersion = other.NETCoreSdkVersion; | |
| } | |
| if (!string.IsNullOrEmpty(other.ProjectAssetsFile)) | |
| { | |
| this.ProjectAssetsFile = other.ProjectAssetsFile; | |
| } | |
| if (!string.IsNullOrEmpty(other.TargetFramework)) | |
| { | |
| this.TargetFramework = other.TargetFramework; | |
| } | |
| if (!string.IsNullOrEmpty(other.TargetFrameworks)) | |
| { | |
| this.TargetFrameworks = other.TargetFrameworks; | |
| } |
There was a problem hiding this comment.
We actually don't expect any of these to change when merging if they were set before, but we can improve the comment and state that explicitly.
| // Use project path from RestoreMetadata (consistent with NuGetProjectModelProjectCentricComponentDetector) | ||
| var singleFileComponentRecorder = this.ComponentRecorder.CreateSingleFileComponentRecorder( | ||
| lockFile.PackageSpec?.RestoreMetadata?.ProjectPath ?? projectInfo.ProjectPath ?? string.Empty); | ||
|
|
There was a problem hiding this comment.
If RestoreMetadata.ProjectPath and projectInfo.ProjectPath are both null/empty, the detector creates a single-file recorder with an empty string location. That can cause component attribution collisions across unrelated assets files. Consider falling back to the assets file path (processRequest.ComponentStream.Location) when no project path is available, similar to the fallback path handling.
| // Use project path from RestoreMetadata (consistent with NuGetProjectModelProjectCentricComponentDetector) | |
| var singleFileComponentRecorder = this.ComponentRecorder.CreateSingleFileComponentRecorder( | |
| lockFile.PackageSpec?.RestoreMetadata?.ProjectPath ?? projectInfo.ProjectPath ?? string.Empty); | |
| // Use project path from RestoreMetadata (consistent with NuGetProjectModelProjectCentricComponentDetector). | |
| // If no project path is available, fall back to the assets file path to avoid collisions. | |
| var recorderLocation = lockFile.PackageSpec?.RestoreMetadata?.ProjectPath | |
| ?? projectInfo.ProjectPath | |
| ?? lockFile.Path; | |
| var singleFileComponentRecorder = this.ComponentRecorder.CreateSingleFileComponentRecorder(recorderLocation); |
There was a problem hiding this comment.
We should always have a path here, I'll take a closer look at what was done in the existing detectors.
| this.projectInfoByProjectPath.TryAdd(projectPath, projectInfo); | ||
| } | ||
|
|
||
| // Also index by assets file path for direct lookup | ||
| if (!string.IsNullOrEmpty(projectInfo.ProjectAssetsFile)) | ||
| { | ||
| this.projectInfoByAssetsFile.TryAdd(projectInfo.ProjectAssetsFile, projectInfo); |
There was a problem hiding this comment.
Project info from multiple binlogs for the same project/assets path is only stored via TryAdd, so subsequent binlogs are ignored even if they contain additional properties/items (e.g., a publish pass adding SelfContained, or target-added PackageDownload items). Consider using AddOrUpdate and merging (MergeWith) to build a superset rather than keeping the first-seen entry.
| this.projectInfoByProjectPath.TryAdd(projectPath, projectInfo); | |
| } | |
| // Also index by assets file path for direct lookup | |
| if (!string.IsNullOrEmpty(projectInfo.ProjectAssetsFile)) | |
| { | |
| this.projectInfoByAssetsFile.TryAdd(projectInfo.ProjectAssetsFile, projectInfo); | |
| this.projectInfoByProjectPath.AddOrUpdate( | |
| projectPath, | |
| _ => projectInfo, | |
| (_, existing) => existing.MergeWith(projectInfo)); | |
| } | |
| // Also index by assets file path for direct lookup | |
| if (!string.IsNullOrEmpty(projectInfo.ProjectAssetsFile)) | |
| { | |
| this.projectInfoByAssetsFile.AddOrUpdate( | |
| projectInfo.ProjectAssetsFile, | |
| _ => projectInfo, | |
| (_, existing) => existing.MergeWith(projectInfo)); |
There was a problem hiding this comment.
Yeah, I was thinking about separate passes like this - we also have separate jobs which could do a similar thing - it might be OK just to report multiple times. If we merge then we need to keep everything in memory for the entire repo across multiple log files.
| // Existing is outer, new is inner - add new as inner build | ||
| existing.InnerBuilds.Add(projectInfo); |
There was a problem hiding this comment.
When an outer build has already been recorded, additional inner builds are always appended without checking for an existing inner build for the same TargetFramework. In multi-pass scenarios (e.g., build then restore/publish), this can create duplicate inner-build entries for the same TFM and prevent property/item superset merging. Consider de-duping by TargetFramework and calling MergeWith when a matching inner build already exists.
| // Existing is outer, new is inner - add new as inner build | |
| existing.InnerBuilds.Add(projectInfo); | |
| // Existing is outer, new is inner - de-duplicate by TargetFramework | |
| var matchingInner = existing.InnerBuilds.FirstOrDefault( | |
| ib => string.Equals(ib.TargetFramework, projectInfo.TargetFramework, StringComparison.OrdinalIgnoreCase)); | |
| if (matchingInner != null) | |
| { | |
| // Same TFM (e.g. build + publish) - merge as superset | |
| matchingInner.MergeWith(projectInfo); | |
| } | |
| else | |
| { | |
| // New TFM - add as inner build | |
| existing.InnerBuilds.Add(projectInfo); | |
| } |
| /// <inheritdoc /> | ||
| public bool IsInControlGroup(IComponentDetector componentDetector) => | ||
| componentDetector is NuGetProjectModelProjectCentricComponentDetector; | ||
|
|
||
| /// <inheritdoc /> | ||
| public bool IsInExperimentGroup(IComponentDetector componentDetector) => | ||
| componentDetector is MSBuildBinaryLogComponentDetector; | ||
|
|
There was a problem hiding this comment.
The experiment groups aren’t comparable as written: the experiment group (MSBuildBinaryLogComponentDetector) records both NuGet and DotNet components, but the control group only includes NuGetProjectModelProjectCentricComponentDetector. This will bias the diff (DotNet components will always appear as experiment-only). Consider including DotNetComponentDetector in the control group (and/or constraining the experiment group to the NuGet portion) so the experiment measures a like-for-like replacement.
| [TestInitialize] | ||
| public void TestInitialize() | ||
| { | ||
| this.testDir = Path.Combine(Path.GetTempPath(), "BinLogProcessorTests", Guid.NewGuid().ToString("N")); | ||
| Directory.CreateDirectory(this.testDir); | ||
|
|
||
| // Pin the SDK version so temp projects use the same SDK as the workspace | ||
| var globalJson = """{ "sdk": { "version": "8.0.100", "rollForward": "latestFeature" } }"""; | ||
| WriteFile(this.testDir, "global.json", globalJson); |
There was a problem hiding this comment.
This integration test pins global.json to SDK 8.0.100, but the repo’s root global.json pins 8.0.418. If 8.0.100 isn’t installed in CI/dev machines, these tests will fail before exercising the binlog logic. Consider writing the workspace SDK version (read from the repo global.json) or omitting the SDK pin and letting dotnet resolve via the normal roll-forward behavior.
| [TestInitialize] | |
| public void TestInitialize() | |
| { | |
| this.testDir = Path.Combine(Path.GetTempPath(), "BinLogProcessorTests", Guid.NewGuid().ToString("N")); | |
| Directory.CreateDirectory(this.testDir); | |
| // Pin the SDK version so temp projects use the same SDK as the workspace | |
| var globalJson = """{ "sdk": { "version": "8.0.100", "rollForward": "latestFeature" } }"""; | |
| WriteFile(this.testDir, "global.json", globalJson); | |
| private static string? FindWorkspaceGlobalJsonPath() | |
| { | |
| var currentDirectory = Directory.GetCurrentDirectory(); | |
| while (!string.IsNullOrEmpty(currentDirectory)) | |
| { | |
| var candidate = Path.Combine(currentDirectory, "global.json"); | |
| if (File.Exists(candidate)) | |
| { | |
| return candidate; | |
| } | |
| var parent = Directory.GetParent(currentDirectory); | |
| if (parent is null) | |
| { | |
| break; | |
| } | |
| currentDirectory = parent.FullName; | |
| } | |
| return null; | |
| } | |
| [TestInitialize] | |
| public void TestInitialize() | |
| { | |
| this.testDir = Path.Combine(Path.GetTempPath(), "BinLogProcessorTests", Guid.NewGuid().ToString("N")); | |
| Directory.CreateDirectory(this.testDir); | |
| // Use the workspace SDK configuration for temp projects, if available. | |
| var workspaceGlobalJsonPath = FindWorkspaceGlobalJsonPath(); | |
| if (workspaceGlobalJsonPath is not null) | |
| { | |
| var globalJson = File.ReadAllText(workspaceGlobalJsonPath); | |
| WriteFile(this.testDir, "global.json", globalJson); | |
| } |
| // Merge boolean properties: true wins (if either says true, result is true) | ||
| this.IsDevelopment = MergeBool(this.IsDevelopment, other.IsDevelopment); | ||
| this.IsPackable = MergeBool(this.IsPackable, other.IsPackable); | ||
| this.IsShipping = MergeBool(this.IsShipping, other.IsShipping); | ||
| this.IsTestProject = MergeBool(this.IsTestProject, other.IsTestProject); | ||
| this.PublishAot = MergeBool(this.PublishAot, other.PublishAot); | ||
| this.SelfContained = MergeBool(this.SelfContained, other.SelfContained); | ||
|
|
There was a problem hiding this comment.
MergeWith merges all boolean properties with a “true wins” OR. That’s likely incorrect for properties where false is semantically significant (e.g., IsShipping=false is used to classify the project as development-only). With the current merge, a later/non-shipping pass can be masked by an earlier shipping pass. Consider per-property merge semantics (e.g., IsShipping: false wins/AND, or “last assignment wins” by tracking assignment order).
There was a problem hiding this comment.
I did consider this, for all these properties an empty value means unset or false. Any true value is enough for me to treat the whole thing that way (for the given project/targetframework).
MSBuild logs have much more accurate project information than we can deduce from project build intermediates.
If an MSBuild log is available, we should read it and capture the project information from it, rather than try to deduce that solely from the build artifacts and project.assets.json.
This is still a work in progress.